    Document Classification Systems in Heterogeneous Computing Environments

    Datacenter workloads demand high-throughput, low-cost and power-efficient solutions. In most data centers, operating costs dominate infrastructure costs. The ever-growing amounts of data and the critical need for higher-throughput, more energy-efficient document classification solutions motivated us to investigate alternatives to traditional homogeneous CPU-based implementations of document classification systems. Several heterogeneous systems that combine CPUs with GPUs and FPGAs as system accelerators have been investigated in the past. The increasing complexity of FPGAs has made them attractive devices for heterogeneous computing environments, but also difficult to program using hardware description languages (HDLs). We explore the trade-offs between high-level synthesis (HLS) and low-level synthesis when programming FPGAs. Low-level synthesis uses fewer FPGA hardware resources and delivers higher throughput than the HLS tool, while the HLS tool can also target other heterogeneous computing devices such as multicore CPUs and GPUs. Through our implementation experience and empirical results for data-centric applications, we conclude that power-efficient results can be achieved for this set of applications using either low-level or high-level synthesis to program FPGAs.

    High Level Programming of FPGAs for HPC and Data Centric Applications

    Heterogeneous computing offers a promising solution for high-performance and energy-efficient computing. Until recently, the high-performance heterogeneous computing arena was dominated by discrete GPUs, but in recent years new solutions based on devices such as APUs and FPGAs have emerged. These new solutions show promise for further improvements in energy efficiency. FPGA-based heterogeneous computing is an especially promising direction, since it allows the creation of custom hardware solutions for data-centric parallel applications. One of the main issues delaying the widespread adoption of FPGAs as mainstream high-performance computing devices is the difficulty of programming them. Altera's OpenCL implementation for FPGAs provides a high level of abstraction and makes FPGAs easier to program. Two high-performance computing applications (Lava Molecular Dynamics and Nearest-Neighbours) and a data-centric application (Document Classification) were compiled using Altera's OpenCL compiler and programmed on a Nallatech FPGA board. Hardware utilization, kernel execution time and total execution time are reported. Speedups of up to 5.3x, 4.3x and 1.3x over the dual Xeon processor implementations were achieved for LavaMD, Nearest-Neighbours and Document Classification, respectively.

    High Level Programming of Document Classification Systems for Heterogeneous Environments using OpenCL (Abstract Only)

    Document classification is at the heart of several of the applications that have been driving the proliferation of the internet in our daily lives. The ever-growing amounts of data and the need for higher-throughput, more energy-efficient document classification solutions motivated us to investigate alternatives to traditional homogeneous CPU-based implementations. We investigate a heterogeneous system in which CPUs are combined with FPGAs as system accelerators. Incorporating FPGAs as accelerators in a heterogeneous computing environment allows the creation of flexible custom hardware solutions that can potentially offer increased power efficiency and performance gains. One of the main issues delaying the widespread adoption of FPGAs as standard heterogeneous system accelerators is the difficulty of programming them. The OpenCL standard offers a unified C-based programming model for any device that conforms to it. We investigate an Altera OpenCL FPGA-based implementation of a document classification system in which a stream of HTML documents is scored against a profile on a document-by-document basis. The results show that the throughput of the document classification application with and without Bloom filters is 312 MB/s and 343 MB/s, respectively, when running on the CPU, and 354 MB/s and 452 MB/s, respectively, when running on the FPGA. Our results also show up to a 32% improvement in power efficiency for the FPGA implementation over the CPU implementation. We would like to thank Davor Capalija from Altera for his invaluable advice during our work on the FPGA version of the algorithm.
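    To make "scoring a stream of HTML documents against a profile, with or without Bloom filters" concrete, here is a minimal Python sketch of one common way such a classifier can be structured. It is not the authors' OpenCL implementation. The profile format (a dict mapping terms to integer weights), the tag-stripping tokenizer, the summed-weight scoring rule, the threshold, the Bloom filter size and hash scheme, and the names `BloomFilter`, `tokenize`, `score_document` and `classify_stream` are all hypothetical assumptions. A Bloom filter has no false negatives, so it is typically used as a cheap pre-check: the full profile lookup only happens for terms that might be in the profile, and the scores are the same with or without it.

```python
# Hypothetical sketch: the profile format, tokenizer, scoring rule and filter
# parameters are illustrative assumptions, not the paper's design.
import hashlib
import re


class BloomFilter:
    """Bit-array Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, term: str):
        # Derive k bit positions by double hashing one 128-bit BLAKE2b digest.
        digest = hashlib.blake2b(term.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, term: str) -> None:
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(term))


TAG_RE = re.compile(r"<[^>]*>")
TERM_RE = re.compile(r"[a-z0-9]+")


def tokenize(html: str) -> list:
    """Replace HTML tags with spaces, lowercase, and split into terms."""
    return TERM_RE.findall(TAG_RE.sub(" ", html).lower())


def score_document(html: str, profile: dict, bloom=None) -> int:
    """Sum the profile weight of every term occurrence in one document."""
    score = 0
    for term in tokenize(html):
        # Skip the profile lookup for terms the filter rules out.
        if bloom is not None and not bloom.might_contain(term):
            continue
        score += profile.get(term, 0)  # filter false positives add 0
    return score


def classify_stream(docs, profile: dict, threshold: int, use_bloom: bool = True):
    """Yield (score, matches_profile) for each document, one at a time."""
    bloom = None
    if use_bloom:
        bloom = BloomFilter()
        for term in profile:
            bloom.add(term)
    for doc in docs:
        s = score_document(doc, profile, bloom)
        yield s, s >= threshold
```

For example, with the hypothetical profile `{"fpga": 3, "opencl": 2, "gpu": 1}` and a threshold of 5, the document `<p>FPGA and OpenCL on an FPGA</p>` scores 3 + 2 + 3 = 8 and matches the profile. A Python dict lookup is already cheap, so in this sketch the Bloom filter only illustrates the control flow and does not show a speedup.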